Abstract

The abstract of "Contextual Values of Observables in Quantum Mea- 
surements" by J. Dressel, S. Agarwal, and A. N. Jordan [Phys. Rev. Lett. 
104 240401 (2010)] (called DAJ below), states: 

"We introduce contextual values as a generalization of the 
eigenvalues of an observable that takes into account both the 
system observable and a general measurement procedure. This 
technique leads to a natural definition of a general conditioned 
average that converges uniquely to the quantum weak value in 
the minimal disturbance limit." 

A counterexample to the claim of the last sentence was presented in [fQ, 
a 32-page paper discussing various topics related to DAJ, and a simpler 
counterexample in Version 1 of the present work. Subsequently Dressel 
and Jordan placed in the arXiv the paper of the title (called DJ below) 
which attempts to prove the claim of DAJ quoted above under stronger 
hypotheses than given in DAJ, hypotheses which the counterexample does 
not satisfy. The present work (Version 5) presents a new counterexample 
to this claim of DJ. 

A brief introduction to "contextual values" is included. Also included is 
a critical analysis of DJ. 



1 Introduction 

A counterexample to a major claim of 

J. Dressel, S. Agarwal, and A. N. Jordan, "Contextual values of observables
in quantum measurements", Phys. Rev. Lett. 104 240401 (2010)

*For contact information, go to http://www.math.umb.edu/~sp 
(henceforth called DAJ) was given in [J], a 32-page paper discussing DAJ in
detail. The claim in question is stated as follows in DAJ's abstract: 

"We introduce contextual values as a generalization of the eigenval- 
ues of an observable that takes into account both the system observ- 
able and a general measurement procedure. This technique leads to 
a natural definition of a general conditioned average that converges 
uniquely to the quantum weak value in the minimal disturbance 
limit." 

This wording (particularly, "minimal disturbance limit") is potentially misleading,
as will be explained briefly below, and is discussed more fully in [4].

Version 1 presented a simple counterexample to the claim of the above quote 
based on my interpretation of the vague presentation of DAJ. A later paper by 
Dressel and Jordan, "Sufficient conditions for uniqueness of the Weak Value" 
[5] (henceforth called DJ) adjoined new (and very strong) hypotheses to DAJ 
which the counterexample did not satisfy and claimed to prove that the above 
quote was correct under these new hypotheses. 

The present work presents a new counterexample to that claim. It also 
includes the introduction to the main ideas of DAJ of Version 1 and a critical 
analysis of DJ. 

2 Notation and brief reprise of DAJ 

To establish notation, we briefly summarize the main ideas of DAJ. The notation
generally follows DAJ except that DAJ denotes operators by both boldface
and circumflex, e.g., $\hat{\mathbf{M}}$, but we omit the boldface and "hat" decorations. Also,
we use $P_f$ to denote the operator of projection onto the subspace spanned by a
vector $f$. (DAJ uses $E_f^{(2)}$.)

When we quote directly an equation of DAJ, we use DAJ's equation number, 
which ranges from (1) to (10), and also DAJ's original notation. Other equations 
will bear numbers beginning with (100). 

Suppose we are given a set $\{M_j\}$ of measurement operators, where $j$ is an
index ranging over a finite set. We assume that the reader is familiar with
the theory of measurement operators, as given, for example, in the book [5] of
Nielsen and Chuang. By definition, measurement operators satisfy

$$\sum_j M_j^\dagger M_j = I , \tag{100}$$

where $I$ denotes the identity operator. With such measurement operators is
associated the positive operator valued measure (POVM) $\{E_j\}$ with
$E_j := M_j^\dagger M_j$. When the system is in a (generally mixed) normalized
state $\rho$ (represented as a positive operator of trace 1), the probability
of a measurement yielding result $j$ is
$\mathrm{Tr}[M_j^\dagger M_j \rho] = \mathrm{Tr}[E_j \rho]$. Moreover, after the
measurement, the system will be in (unnormalized) state $M_j \rho M_j^\dagger$,
which when normalized is:

$$\text{normalized post-measurement state} = \frac{M_j \rho M_j^\dagger}{\mathrm{Tr}[M_j \rho M_j^\dagger]} . \tag{101}$$
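As a concrete illustration of (100) and (101), the following sketch evaluates the outcome probabilities and the normalized post-measurement states for a pair of measurement operators. The operators diag(3/5, 4/5) and diag(4/5, 3/5), the state `rho`, and the helper names `matmul` and `trace` are hypothetical choices made only for this example; they do not appear in DAJ.

```python
from fractions import Fraction as Fr

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

# Hypothetical real diagonal (hence Hermitian) measurement operators.
M = [[[Fr(3, 5), 0], [0, Fr(4, 5)]],
     [[Fr(4, 5), 0], [0, Fr(3, 5)]]]
rho = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(1, 4)]]   # positive, trace 1

# Completeness (100): sum_j M_j^dagger M_j = I (here M_j^dagger = M_j).
total = [[sum(matmul(m, m)[i][j] for m in M) for j in range(2)]
         for i in range(2)]

# Outcome probabilities Tr[E_j rho] and normalized states (101).
probs = [trace(matmul(matmul(m, m), rho)) for m in M]
posts = []
for m, p in zip(M, probs):
    unnormalized = matmul(matmul(m, rho), m)
    posts.append([[x / p for x in row] for row in unnormalized])
```

Exact rational arithmetic is used so that the completeness relation can be checked by equality rather than to within rounding error.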



For notational simplicity, we normalize states only in calculations where the 
normalization factor is material. 

We also assume given an operator $A$, representing what DAJ calls "the
system observable" in the above quote. We ask if it is possible to choose real
numbers $\alpha_j$, which DAJ calls contextual values, such that

$$A = \sum_j \alpha_j E_j . \tag{102}$$

This will not always be possible, but we consider only cases for which it is.
When it is possible, it follows that the expectation $\mathrm{Tr}[A\rho]$ of $A$
in the state $\rho$ equals the expectation calculated from the probabilities
$\mathrm{Tr}[E_j \rho]$ obtained from the POVM $\{E_j\}$, with the numerical
value $\alpha_j$ associated with outcome $j$:

$$\mathrm{Tr}[A\rho] = \sum_j \alpha_j \mathrm{Tr}[E_j \rho] . \tag{103}$$
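A small worked instance of (102) and (103) may help. The sketch below assumes a hypothetical commuting example (POVM elements diag(9/25, 16/25) and diag(16/25, 9/25), observable A = diag(1, -1)); none of these numbers come from DAJ. It solves (102) by Cramer's rule and then checks (103) for one state.

```python
from fractions import Fraction as Fr

# Hypothetical POVM elements E_j = M_j^2 (diagonals only) and A = diag(1, -1).
E = [(Fr(9, 25), Fr(16, 25)), (Fr(16, 25), Fr(9, 25))]
a = (Fr(1), Fr(-1))

# Component i of (102) reads E_1[i]*alpha_1 + E_2[i]*alpha_2 = a[i].
# Solve this 2 x 2 linear system by Cramer's rule.
det = E[0][0] * E[1][1] - E[1][0] * E[0][1]
alpha = ((a[0] * E[1][1] - E[1][0] * a[1]) / det,
         (E[0][0] * a[1] - a[0] * E[0][1]) / det)

# Check (103) for a state with diagonal (3/4, 1/4); off-diagonal entries
# of rho do not enter because every operator here is diagonal.
rho_diag = (Fr(3, 4), Fr(1, 4))
probs = [e[0] * rho_diag[0] + e[1] * rho_diag[1] for e in E]   # Tr[E_j rho]
lhs = a[0] * rho_diag[0] + a[1] * rho_diag[1]                  # Tr[A rho]
rhs = sum(al * p for al, p in zip(alpha, probs))
```

In this example the contextual values are -25/7 and 25/7, which lie outside the eigenvalue range [-1, 1] of A, yet their probability-weighted average reproduces the expectation of A.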



The book [6] of Wiseman and Milburn defines a measurement to be "minimally
disturbing" if the measurement operators $M_j$ are all positive (which
implies that they are Hermitian).¹ DAJ uses a slightly more general definition
to define their "minimal disturbance limit" of the above quote. We shall use the
definition of Wiseman and Milburn [6] because it is simpler and sufficient for
our counterexample. A counterexample under the definition of Wiseman and
Milburn will also be a counterexample under any more inclusive definition, such
as that of DAJ.

A particularly simple kind of measurement is one in which there are only
two measurement operators, $P_f$ and $I - P_f$. Intuitively, this "measurement"
asks whether the (unnormalized) post-measurement state is $P_f$ or not. Here we
are using the notation of mixed states. Phrased in terms of pure states, and
assuming that the pre-measurement state $\rho$ is pure, the measurement determines
if the post-measurement state is the pure state $f$ or a pure state orthogonal to
$f$.


Suppose that we make a measurement with the original measurement
operators $M_j$ and then make a second measurement with measurement operators
$P_f$, $I - P_f$. In this situation, the second measurement is called a
"postselection", and when it yields state $P_f$, one says that the postselection
has been "successful".

1 This is a technical definition which can be misleading if one does not realize that normal 
associations of the English phrase "minimally disturbing" are not implied. Further discussion 
can be found in [6] and [4]. 




Such a compound measurement may be equivalently considered as a single
measurement with measurement operators $\{P_f M_j, (I - P_f) M_j\}$. "Successful"
postselection leaves the system in normalized state

$$\frac{(P_f M_j)\rho (P_f M_j)^\dagger}{\mathrm{Tr}[(P_f M_j)\rho (P_f M_j)^\dagger]} , \tag{104}$$

which is pure state $f$ ($P_f$ in mixed state notation). This result will occur with
probability
$p(j,f) = \mathrm{Tr}[(P_f M_j)^\dagger P_f M_j \rho] = \mathrm{Tr}[M_j^\dagger P_f M_j \rho]$.

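The probability $p(j|f)$ of first measurement result $j$ given that the
postselection was successful is:

$$p(j|f) = \frac{p(j,f)}{\sum_i p(i,f)} = \frac{\mathrm{Tr}[M_j^\dagger P_f M_j \rho]}{\sum_i \mathrm{Tr}[M_i^\dagger P_f M_i \rho]} . \tag{105}$$

Hence, if we assign numerical value $\alpha_j$ to result $j$ as above, the conditional
expectation of the measurement given successful postselection is:

$$\frac{\sum_j \alpha_j \mathrm{Tr}[M_j^\dagger P_f M_j \rho]}{\sum_i \mathrm{Tr}[M_i^\dagger P_f M_i \rho]} . \tag{106}$$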
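The conditional expectation (106) is easy to evaluate when all measurement operators are real and diagonal. The following is a minimal sketch under that assumption; the function name `conditioned_average` and the numerical operators, state, and contextual values are hypothetical and serve only to show how the quantities in (105) and (106) fit together.

```python
from fractions import Fraction as Fr

def conditioned_average(alpha, M_diag, rho, f):
    """Evaluate (106) for real diagonal M_j (given by their diagonals),
    a real symmetric state rho, and a real postselection vector f."""
    n = len(f)

    def weight(m):
        # f^T M rho M f equals Tr[M P_f M rho] up to the squared norm of f,
        # which cancels between numerator and denominator of (106).
        return sum(f[a] * m[a] * rho[a][b] * m[b] * f[b]
                   for a in range(n) for b in range(n))

    w = [weight(m) for m in M_diag]
    return sum(al * wj for al, wj in zip(alpha, w)) / sum(w)

# Hypothetical data: two measurement operators, A = diag(1, -1).
M_diag = [(Fr(3, 5), Fr(4, 5)), (Fr(4, 5), Fr(3, 5))]
alpha = (Fr(-25, 7), Fr(25, 7))      # solves sum_j alpha_j M_j^2 = A
rho = [[Fr(3, 4), Fr(1, 4)], [Fr(1, 4), Fr(1, 4)]]

on_eigenvector = conditioned_average(alpha, M_diag, rho, (1, 0))
on_superposition = conditioned_average(alpha, M_diag, rho, (1, 1))
```

Postselecting on the eigenvector (1, 0) of A returns the corresponding eigenvalue 1, while postselecting on (1, 1) gives 25/74. This particular measurement is not weak, so no agreement with a weak value should be expected here.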



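This is DAJ's "general conditioned average". Written in DAJ's original
notation, this reads

$${}_f\langle \hat{\mathbf{A}} \rangle = \frac{\sum_j \alpha_j \mathrm{Tr}[\hat{\mathbf{E}}_f^{(2)} \hat{\mathbf{M}}_j \hat{\boldsymbol{\rho}} \hat{\mathbf{M}}_j^\dagger]}{\sum_j \mathrm{Tr}[\hat{\mathbf{E}}_f^{(2)} \hat{\mathbf{M}}_j \hat{\boldsymbol{\rho}} \hat{\mathbf{M}}_j^\dagger]} . \tag{6}$$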



DAJ's theory of contextual values was motivated by a theory of "weak
measurements" initiated by Aharonov, Albert, and Vaidman [10] in 1988.
Intuitively, a "weak" measurement is one which negligibly disturbs the state of
the system. This can be formalized by introducing a "weak measurement"
parameter $g$ on which the measurement operators $M_j = M_j(g)$ depend, and
requiring that

$$\lim_{g \to 0} \frac{M_j(g)\rho M_j^\dagger(g)}{\mathrm{Tr}[M_j(g)\rho M_j^\dagger(g)]} = \rho \quad \text{for all } \rho \text{ and } j . \tag{107}$$

This says that for small $g$, the post-measurement state is almost the same as the
pre-measurement state $\rho$ (cf. equation (104)). We shall refer to this as "weak
measurement" or a "weak limit".

The "minimal disturbance limit" mentioned in the above quote from DAJ's
abstract presumably refers to (107) combined with their generalization of
Wiseman and Milburn's "minimally disturbing" condition that the measurement
operators be positive, and this is the definition that we shall use.²



2 DAJ only partially and unclearly defines its "minimally disturbing" condition, but in a 
message to Physical Review Letters (PRL) in response to a "Comment" paper that I sub- 
DAJ claims that in their "minimal disturbance limit" (which is implied by
a weak limit with positive measurement operators), their "general conditioned
average" ${}_f\langle A \rangle$ (6), our (106), is always given by:

$${}_f\langle A \rangle = \frac{\frac{1}{2}\mathrm{Tr}[P_f \{A, \rho\}]}{\mathrm{Tr}[P_f \rho]} . \tag{108}$$

Our equation (108) is equation (7) of DAJ:

$$A_w = \frac{\mathrm{Tr}[\hat{\mathbf{E}}_f^{(2)} \{\hat{\mathbf{A}}, \hat{\boldsymbol{\rho}}\}]}{2\,\mathrm{Tr}[\hat{\mathbf{E}}_f^{(2)} \hat{\boldsymbol{\rho}}]} . \tag{7}$$

Here $A_w$ is their notation for "weak value" of $A$.³

The statement of DAJ quoted in the Introduction, that their

"... general conditioned average . . . converges uniquely to the
quantum weak value in the minimal disturbance limit",

implies that for a weak limit of positive measurement operators, their (6) always
evaluates to (7), or in our notation, our (106) always evaluates to (108). We
shall give an example for which (106) does not evaluate to (108).



3 The counterexample 
3.1 General discussion 

We are assuming the "minimal disturbance" condition that the measurement
operators be positive, so in the definition (106) of DAJ's "general conditioned
average", we replace $M_j^\dagger$ with $M_j$. First we examine its denominator.
Let

$$\eta_j(g) := \mathrm{Tr}[M_j(g)\rho M_j(g)] , \tag{109}$$

which are inverse normalization factors for the unnormalized post-measurement
states $M_j(g)\rho M_j(g)$ (cf. (101)). We shall assume that all $\eta_j(g)$ are bounded for
mitted, the authors of DAJ confirmed that Wiseman and Milburn's definition implies theirs. 
DAJ uses but does not define the phrase "weak limit", but in the same message to PRL,
the authors state that (107) corresponds to "ideally weak measurement". Since "ideally weak
measurement" must be (assuming normal usage of syntax) a special case of mere "weak
measurement", our counterexample which assumes (107) will also be a counterexample to the
statement of DAJ quoted in the introduction.

I have made several direct inquiries to the authors of DAJ requesting a precise definition of 
their "minimal disturbance limit", but all have been ignored. 

3 In the traditional theory of "weak measurement" initiated by [10], the weak limit (i.e.,
$\lim_{g \to 0}$) of (106) (equivalently, (6)) would be called a "weak value" of $A$, though the traditional
"weak measurement" literature calculates it via different procedures. When $\rho$ is a pure state,
most modern authors calculate this weak value as (108) (equivalently (7)), though the seminal
paper [10] arrived (via questionable mathematics) at a complex weak value of which (108) is
the real part. (Only recently was it recognized that "weak values" are not unique [7] [8] [9].)
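small $g$, which is expected (because we expect $M_j(g)$ to approach a multiple of
the identity for small $g$ in order to make the measurement "weak") and will be
the case for our counterexample. We have

$$\begin{aligned}
\lim_{g \to 0} \sum_j \mathrm{Tr}[P_f M_j(g)\rho M_j(g)]
&= \lim_{g \to 0} \sum_j \mathrm{Tr}\left[P_f \left(\frac{M_j(g)\rho M_j(g)}{\eta_j(g)} - \rho\right)\right] \eta_j(g)
 + \lim_{g \to 0} \sum_j \mathrm{Tr}[P_f \rho]\, \eta_j(g) \\
&= \lim_{g \to 0} \sum_j \mathrm{Tr}[P_f \rho]\, \mathrm{Tr}[M_j(g)\rho M_j(g)] \\
&= \mathrm{Tr}[P_f \rho] \lim_{g \to 0} \mathrm{Tr}\Big[\sum_j M_j(g) M_j(g) \rho\Big] \\
&= \mathrm{Tr}[P_f \rho] ,
\end{aligned} \tag{110}$$

because the first limit on the right vanishes by (107) and the boundedness of the
$\eta_j(g)$, $\sum_j M_j^2 = \sum_j M_j^\dagger M_j = I$, and $\mathrm{Tr}[\rho] = 1$. This is the denominator
of DAJ's claimed result (108) (half the denominator of their (7) because both
numerator and denominator of our (108) differ from (7) by a factor of 1/2).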

Next we examine the numerator of the "general conditioned average" (106).
We shall write it as a sum of two terms, the first term leading to DAJ's (108),
and the second a term which does not obviously vanish in the limit $g \to 0$. The
counterexample will be obtained by finding a case for which the limit of the
second term actually does not vanish.

Note the trivial identity for operators $M$, $\rho$:

$$M\rho M = M[\rho, M] + M^2 \rho$$

and the similar

$$M\rho M = -[\rho, M]M + \rho M^2 .$$

Combining these gives

$$M\rho M = \frac{1}{2}\{M^2, \rho\} + \frac{1}{2}[M, [\rho, M]] . \tag{111}$$
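Identity (111) is purely algebraic, so it can be spot-checked with exact arithmetic. The sketch below does this for one arbitrarily chosen pair of 2 x 2 matrices; the matrices and the helper names (`mul`, `add`, `comm`, `anti`, `half`) are illustrative assumptions, not part of the argument.

```python
from fractions import Fraction as Fr

def mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(a, b, s=1):
    """Return a + s*b."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def comm(a, b):
    """Commutator [a, b] = ab - ba."""
    return add(mul(a, b), mul(b, a), -1)

def anti(a, b):
    """Anticommutator {a, b} = ab + ba."""
    return add(mul(a, b), mul(b, a))

def half(a):
    return [[x / 2 for x in row] for row in a]

M = [[Fr(2), Fr(1)], [Fr(1), Fr(-3)]]               # arbitrary test operator
rho = [[Fr(3, 4), Fr(1, 5)], [Fr(1, 5), Fr(1, 4)]]  # arbitrary test state

lhs = mul(mul(M, rho), M)                                       # M rho M
rhs = add(half(anti(mul(M, M), rho)), half(comm(M, comm(rho, M))))
```

Since both sides are computed with rational numbers, `lhs == rhs` holds exactly rather than up to rounding.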

Using (111) and the contextual value equation (102),
$A = \sum_j \alpha_j E_j = \sum_j \alpha_j M_j^2$, we can rewrite the numerator of (106) as

$$\begin{aligned}
\text{numerator of (106)} &= \sum_j \alpha_j \mathrm{Tr}[M_j P_f M_j \rho] \\
&= \sum_j \alpha_j \mathrm{Tr}[P_f M_j \rho M_j] \\
&= \frac{1}{2}\mathrm{Tr}[P_f \{A, \rho\}] + \sum_j \frac{1}{2}\alpha_j \mathrm{Tr}[P_f [M_j, [\rho, M_j]]] .
\end{aligned} \tag{112}$$

After division by the denominator of (106), the first term gives DAJ's claimed
(7) in the limit $g \to 0$, our (108), and the second term gives

$$\text{difference between weak limit of (6) and (7)} = \lim_{g \to 0} \frac{\sum_j \frac{1}{2}\alpha_j(g)\, \mathrm{Tr}[P_f [M_j(g), [\rho, M_j(g)]]]}{\mathrm{Tr}[P_f \rho]} . \tag{113}$$



We shall call (113) the "anomalous term". Since there is no obvious control
over the size of the $\alpha_j(g)$, a counterexample is expected, but was surprisingly
hard to find.

The Version 1 counterexample for the quoted claim of DAJ and the newer
counterexample for the new claim of DJ are identical up to this point. The
difference is that the Version 1 counterexample used three $2 \times 2$ diagonal
matrices as measurement operators, resulting in a contextual value equation (102)
with multiple solutions, whereas the newer counterexample uses three $3 \times 3$
diagonal matrices for which there is a unique solution to (102). The newer
counterexample could supersede the Version 1 example, but we retain the
original counterexample because of its simple and intuitive nature (e.g., all steps
can be mentally verified).



3.2 The Version 1 counterexample 

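The "system observable" $A$ for the counterexample will correspond to a $2 \times 2$
matrix

$$A = \begin{bmatrix} a & 0 \\ 0 & b \end{bmatrix} . \tag{114}$$

There will be three measurement operators:

$$\begin{aligned}
M_1(g) &:= \begin{bmatrix} 1/2 + g & 0 \\ 0 & 1/2 - g \end{bmatrix} , \qquad
M_2(g) := \begin{bmatrix} 1/2 - g & 0 \\ 0 & 1/2 + g \end{bmatrix} , \\
M_3(g) &:= \begin{bmatrix} \sqrt{1/2 - 2g^2} & 0 \\ 0 & \sqrt{1/2 - 2g^2} \end{bmatrix} .
\end{aligned} \tag{115}$$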



Note that $M_3(g)$ is uniquely defined by the measurement operator equation
$\sum_{j=1}^{3} M_j(g)^2 = I$ and that all three measurement operators approach multiples
of the identity as $g \to 0$, which assures weakness of the measurement. Note also
that $M_3(g)$ is actually a multiple of the identity for all $g$, so the commutators
in the expression (113) for the anomalous term which involve $M_3$ vanish. That
is, $M_3$, and hence $\alpha_3$, make no contribution to the anomalous term.

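Writing out the contextual value equation (102) in components gives two
scalar equations in three unknowns:

$$\begin{aligned}
(1/2 + g)^2 \alpha_1(g) + (1/2 - g)^2 \alpha_2(g) + (1/2 - 2g^2)\alpha_3(g) &= a \\
(1/2 - g)^2 \alpha_1(g) + (1/2 + g)^2 \alpha_2(g) + (1/2 - 2g^2)\alpha_3(g) &= b .
\end{aligned} \tag{116}$$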
The solution can be messy because of the algebraic coefficients. However, for the
case $a = 1 = b$, a solution can be obtained without calculation. This choice of $a$
and $b$ corresponds to the system observable being the identity operator, so the
measurement is not physically interesting, but it gives a mathematically valid
example with minimal calculation. Later we shall indicate how counterexamples
can be obtained for other choices of $a$ and $b$ from appropriate solutions of (116).
Assuming $a = 1 = b$, the system (116) can be rewritten

$$\begin{aligned}
(1/2 + g)^2 \alpha_1(g) + (1/2 - g)^2 \alpha_2(g) &= 1 - (1/2 - 2g^2)\alpha_3(g) \\
(1/2 - g)^2 \alpha_1(g) + (1/2 + g)^2 \alpha_2(g) &= 1 - (1/2 - 2g^2)\alpha_3(g) .
\end{aligned} \tag{117}$$

We will think of $\alpha_3(g)$ as a free parameter to be arbitrarily chosen, and as noted
previously, the choice will not affect the anomalous term (113).

Viewed in this way, (117) becomes a system of two linear equations in two
unknowns which become the same equation if $\alpha_2 = \alpha_1$, with solution



$$\alpha_2(g) = \alpha_1(g) = \frac{1 - (1/2 - 2g^2)\alpha_3(g)}{(1/2 + g)^2 + (1/2 - g)^2} = \frac{1 - (1/2 - 2g^2)\alpha_3(g)}{1/2 + 2g^2} . \tag{118}$$

Since $\alpha_3$ can be chosen arbitrarily, also $\alpha_2 = \alpha_1$ can be arbitrary; we shall
choose $\alpha_3(g)$ so that

$$\alpha_2(g) = \alpha_1(g) = \frac{1}{g^2} . \tag{119}$$
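The choice (119) can be checked directly. The sketch below, with hypothetical function names `version1_contextual_values` and `povm_diagonals`, solves (118) for the value of $\alpha_3(g)$ that makes $\alpha_1 = \alpha_2 = 1/g^2$ and then verifies both the completeness relation for (115) and the contextual value equation (102) with $A = I$, at the sample value g = 1/10.

```python
from fractions import Fraction as Fr

def version1_contextual_values(g):
    """Return (alpha_1, alpha_2, alpha_3) with alpha_1 = alpha_2 = 1/g^2
    as in (119), and alpha_3 then fixed by (118)."""
    a12 = 1 / g**2
    a3 = (1 - (Fr(1, 2) + 2 * g**2) * a12) / (Fr(1, 2) - 2 * g**2)
    return a12, a12, a3

def povm_diagonals(g):
    """Diagonals of E_j = M_j(g)^2 for the three operators of (115)."""
    return [((Fr(1, 2) + g)**2, (Fr(1, 2) - g)**2),
            ((Fr(1, 2) - g)**2, (Fr(1, 2) + g)**2),
            (Fr(1, 2) - 2 * g**2, Fr(1, 2) - 2 * g**2)]

g = Fr(1, 10)
alpha = version1_contextual_values(g)
E = povm_diagonals(g)

completeness = [sum(e[i] for e in E) for i in range(2)]            # diag of sum_j E_j
A_diag = [sum(al * e[i] for al, e in zip(alpha, E)) for i in range(2)]
```

Only the squares of the measurement operators enter these two checks, so the square root in $M_3(g)$ never has to be evaluated and the arithmetic stays exact.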

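To see that this solution will produce a counterexample, note that for

$$\rho = \begin{bmatrix} \rho_{11} & \rho_{12} \\ \rho_{21} & \rho_{22} \end{bmatrix}$$

and for any diagonal matrix

$$D = \begin{bmatrix} d_1 & 0 \\ 0 & d_2 \end{bmatrix} ,$$

$$[D, \rho] = \begin{bmatrix} 0 & (d_1 - d_2)\rho_{12} \\ (d_2 - d_1)\rho_{21} & 0 \end{bmatrix}
\quad \text{and} \quad
[D, [D, \rho]] = \begin{bmatrix} 0 & (d_1 - d_2)^2 \rho_{12} \\ (d_2 - d_1)^2 \rho_{21} & 0 \end{bmatrix} .$$

In particular for $j = 1, 2$,

$$[M_j(g), [M_j(g), \rho]] = \begin{bmatrix} 0 & 4g^2 \rho_{12} \\ 4g^2 \rho_{21} & 0 \end{bmatrix} ,$$

and since $M_3(g)$ is a multiple of the identity, $[M_3(g), \rho] = 0$. Hence (113)
becomes:

$$\frac{-(1/2)\,\mathrm{Tr}\Big[P_f \sum_{j=1}^{2} \alpha_j(g) [M_j(g), [M_j(g), \rho]]\Big]}{\mathrm{Tr}[P_f \rho]}
= \frac{-\mathrm{Tr}\left[P_f \begin{bmatrix} 0 & 4\rho_{12} \\ 4\rho_{21} & 0 \end{bmatrix}\right]}{\mathrm{Tr}[P_f \rho]} . \tag{120}$$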
This is easily seen to be nonzero for $\rho_{12} \neq 0$ and appropriate $P_f$. For a norm 1
vector $f := (f_1, f_2)$,

$$\text{weak limit of (6)} = \frac{\mathrm{Tr}[P_f \{A, \rho\}]}{2\,\mathrm{Tr}[P_f \rho]} + \frac{-8\,\Re(f_2^* f_1 \rho_{21})}{|f_1|^2 \rho_{11} + 2\Re(f_2^* f_1 \rho_{21}) + |f_2|^2 \rho_{22}} . \tag{121}$$
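The Version 1 counterexample can also be checked numerically by evaluating (106) directly at a small value of $g$ and comparing with (121). The following is only a sketch: the state `rho`, the postselection vector `f` (left unnormalized, since its norm cancels), the value g = 0.001, and the function names are assumptions made for this illustration.

```python
import math

def conditioned_average(alpha, M_diag, rho, f):
    """Evaluate (106) for real diagonal M_j, real symmetric rho, real f."""
    n = len(f)

    def weight(m):
        return sum(f[a] * m[a] * rho[a][b] * m[b] * f[b]
                   for a in range(n) for b in range(n))

    w = [weight(m) for m in M_diag]
    return sum(al * wj for al, wj in zip(alpha, w)) / sum(w)

def version1_average(g, rho, f):
    """(106) for the operators (115) with A = I and the choice (119)."""
    c = math.sqrt(0.5 - 2 * g * g)
    M_diag = [(0.5 + g, 0.5 - g), (0.5 - g, 0.5 + g), (c, c)]
    a12 = 1.0 / g**2
    a3 = (1.0 - (0.5 + 2 * g * g) * a12) / (0.5 - 2 * g * g)   # from (118)
    return conditioned_average((a12, a12, a3), M_diag, rho, f)

rho = [[0.5, 0.25], [0.25, 0.5]]   # hypothetical state with rho_12 != 0
f = (1.0, 1.0)                     # unnormalized postselection vector

weak_value = 1.0                   # (108) with A = I
f_rho_f = sum(f[a] * rho[a][b] * f[b] for a in range(2) for b in range(2))
predicted = weak_value - 8 * f[1] * f[0] * rho[1][0] / f_rho_f   # (121)
computed = version1_average(1e-3, rho, f)
```

For this state and postselection, (121) gives -1/3, far from the value 1 that (108) would predict, and the direct evaluation of (106) at small $g$ agrees with (121).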

The counterexample just given assumed that the system observable
$A := \mathrm{diag}\{a, b\}$ was the identity to make the calculations easy, but
counterexamples can be obtained for any system observable. For example, if $A$ is the
one-dimensional projector $A := \mathrm{diag}\{1, 0\}$, and if system (116) is solved with
$\alpha_1(g) := 1/g^2$, then $\alpha_2(g) = 1/g^2 - 1/(2g)$, and the weak limit of the anomalous
term is the same as just calculated for $A = I$. [?]

DJ [5] adds additional (very strong) hypotheses to those of DAJ which the
counterexample just given does not satisfy.⁴ Assuming these additional
conditions, DJ attempts to prove that (6) evaluates to (7) in their "minimal
disturbance limit". The next sections will present a more powerful counterexample
which disproves this new claim of DJ.

Originally a new paper with the more powerful counterexample was submit- 
ted to the arXiv, but a moderator rejected it. He thought that instead, Version 
1 should be modified. Rather than waste time on a lengthy and unpleasant 
appeal, I decided that it would be easier to do that. 

The paper to this point is Version 1. The sections following comprise essen- 
tially the rejected arXiv paper, which presents the more powerful counterexam- 
ple and critically analyzes DJ. 

The new counterexample is fairly simple, utilizing three $3 \times 3$ matrices, but
not as intuitive as one would like. It was found by analyzing the properties that
measurement operators might have in order that (7) could be shown false, and
then playing with parametrized $3 \times 3$ measurement operators, trying to adjust
the parameters so that (7) would not hold. The Version 4 counterexample is
simpler and more powerful than the Version 2 counterexample. No doubt even 
simpler counterexamples could be found. Besides the new counterexample, we 
attempt to clarify some statements in DJ which we think might be misleading. 



4 The new additional hypotheses for the claim 
that (6) implies (7) in the "minimal distur- 
bance limit" 

Section 5 of DJ lists several additional assumptions, the most important of 
which are: 

1. The $M_j$ commute with each other and $A$ (so they can all be represented
by diagonal matrices).

This is a strong assumption. It is hard to imagine how it could reasonably 

4 However, the fact that these additional conditions cannot reasonably be inferred from 
DAJ is not made clear by DJ, and a casual reader might well obtain the opposite impression. 
be inferred or even guessed from DAJ. The closest reference in DAJ to
something similar is the following. 

"To illustrate [emphasis mine] the construction of the least re- 
dundant set of [contextual values], we consider the case when 
{Mj} and A all commute." 

Nothing is said about this being a general assumption for the rest of the 
paper. Indeed, such an assumption would seriously restrict the applicability
of the following definition (6) of "general conditioned average" ${}_f\langle A \rangle$,
which requires no such assumption. I studied DAJ for months without 
ever being led to even consider the possibility that this might be an as- 
sumption for the general claims of its abstract. 

2. The contextual values $\alpha = (\alpha_1, \ldots, \alpha_N)$ are obtained from the eigenvalues
$a = (a_1, \ldots, a_m)$ of $A = \mathrm{diag}(a_1, \ldots, a_m)$ as

$$\alpha = F^{(+)} a ,$$

where $F$ is an $m \times N$ matrix satisfying $F\alpha = a$ and $F^{(+)}$ its Moore-Penrose
pseudo-inverse. The Version 1 counterexample does not satisfy
this condition.

Relying only on what is written in DAJ, it would be very hard for a reader 
to guess that this is supposed to be a hypothesis for (6), or for a claim that 
(6) implies (7), or both. (I did consider these possibilities, but rejected 
them as too unlikely, as will be explained later in more detail.) The only 
passage of DAJ which seems possibly relevant is: 

". . . we propose that the physically sensible choice of [contex- 
tual values] is the least redundant set uniquely related to the 
eigenvalues through the Moore-Penrose pseudoinverse." 

DAJ gives no reason why this should be the "physically sensible choice" . 
(DJ does attempt to address this issue, but unconvincingly and badly 
incorrectly, as will be discussed later.) Again, to assume this would seem to 
artificially limit the applicability of (6), since (6) is correct independently 
of this assumption. 
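To make the second hypothesis concrete, the following sketch computes the pseudo-inverse solution for the Version 1 system (116) with $a = b = 1$ and compares it with the solution (119) used in Section 3.2. It relies on the standard fact that for a matrix $F$ with independent rows the Moore-Penrose pseudo-inverse is $F^T (F F^T)^{-1}$; the function names and the sample value g = 1/10 are assumptions of this illustration.

```python
from fractions import Fraction as Fr

def F_rows(g):
    """Rows of the 2 x 3 matrix of system (116): F[i][j] = (M_j(g)^2)_{ii}."""
    x, y, z = (Fr(1, 2) + g)**2, (Fr(1, 2) - g)**2, Fr(1, 2) - 2 * g**2
    return [[x, y, z], [y, x, z]]

def pseudo_inverse_solution(F, a):
    """alpha_0 = F^T (F F^T)^{-1} a, valid when the two rows are independent."""
    G = [[sum(p * q for p, q in zip(r, s)) for s in F] for r in F]   # F F^T
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    c = [(G[1][1] * a[0] - G[0][1] * a[1]) / det,
         (G[0][0] * a[1] - G[1][0] * a[0]) / det]
    return [c[0] * F[0][j] + c[1] * F[1][j] for j in range(3)]

def apply(F, v):
    return [sum(F[i][j] * v[j] for j in range(3)) for i in range(2)]

def norm2(v):
    return sum(x * x for x in v)

g = Fr(1, 10)
F = F_rows(g)
alpha0 = pseudo_inverse_solution(F, (1, 1))

a12 = 1 / g**2                                   # the choice (119)
version1 = [a12, a12, (1 - (Fr(1, 2) + 2 * g**2) * a12) / (Fr(1, 2) - 2 * g**2)]
```

Both vectors solve the contextual value equation, but they differ: the pseudo-inverse solution has squared norm below 3 at this value of $g$, whereas the Version 1 solution has entries of size $1/g^2$.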

We do not list the other hypotheses for DJ's attempted proof that (6) implies 
(7) because they are more technical and less surprising than the two just dis- 
cussed. Our counterexample will satisfy all of the hypotheses listed in DJ. The 
counterexample for Version 2 has been replaced by a simpler example in Version 
4. 

5 A counterexample to the claim of DJ

Section V of DJ entitled "General Proof" attempts to show that (6) implies (7)
under their listed hypotheses. The present section presents a counterexample
which satisfies all of their listed hypotheses, yet the weak limit of their "general
conditioned average" (6) is not the "quantum weak value" (7).

We follow identically the analysis of Section 3 through equation (113). This
time, we use a system observable



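$$A = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} \tag{201}$$

and three measurement operators which are $3 \times 3$ diagonal matrices:

$$\begin{aligned}
M_1(g) &:= \mathrm{diag}\big(\sqrt{1/2 + g},\ \sqrt{1/2},\ \sqrt{1/2 + g}\big) , \\
M_2(g) &:= \mathrm{diag}\big(\sqrt{1/3 + g^2},\ \sqrt{1/3 + g},\ \sqrt{1/3}\big) , \\
M_3(g) &:= \mathrm{diag}\big(\sqrt{1/6 - g - g^2},\ \sqrt{1/6 - g},\ \sqrt{1/6 - g}\big) .
\end{aligned} \tag{202}$$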



The contextual values $\alpha = (\alpha_1, \alpha_2, \alpha_3)$ satisfy $F\alpha = a := (1, 0, 0)^T$ with

$$F = \begin{bmatrix}
1/2 + g & 1/3 + g^2 & 1/6 - g - g^2 \\
1/2 & 1/3 + g & 1/6 - g \\
1/2 + g & 1/3 & 1/6 - g
\end{bmatrix} . \tag{203}$$

The matrix $F$ is invertible with inverse (which is also equal to the Moore-Penrose
pseudoinverse $F^{(+)}$)

$$F^{(+)} = F^{-1} = \begin{bmatrix}
\frac{1 - 6g}{6g^2} & \frac{-1 + 2g}{2g} & \frac{-1 + 9g}{6g^2} \\
\frac{1 - 6g}{6g^2} & \frac{1 + 2g}{2g} & \frac{-1 + 3g}{6g^2} \\
\frac{-5 - 6g}{6g^2} & \frac{1 + 2g}{2g} & \frac{5 + 3g}{6g^2}
\end{bmatrix} . \tag{204}$$

The important thing to note is that the first column, which is also $(\alpha_1, \alpha_2, \alpha_3)^T$,
is of leading order $1/g^2$ as $g \to 0$, which is all that the subsequent proof will
use:

$$\alpha_1(g) = \alpha_2(g) = \frac{1 - 6g}{6g^2} , \qquad \alpha_3(g) = \frac{-5 - 6g}{6g^2} . \tag{205}$$

The full inverse (204) was obtained from a computer algebra program, and the
first column (which is all that the counterexample will use) was also checked by
hand using Cramer's rule.
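The Cramer's rule check of (205) can be repeated mechanically. The following sketch builds the matrix (203) in exact rational arithmetic, solves $F\alpha = (1, 0, 0)^T$ column by column, and compares with (205) at the sample value g = 1/10; the function names `det3`, `F_matrix`, and `contextual_values` are hypothetical.

```python
from fractions import Fraction as Fr

def det3(m):
    """Determinant of a 3 x 3 matrix by cofactor expansion along row 0."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

def F_matrix(g):
    """The matrix F of (203)."""
    h, t, s = Fr(1, 2), Fr(1, 3), Fr(1, 6)
    return [[h + g, t + g**2, s - g - g**2],
            [h,     t + g,    s - g],
            [h + g, t,        s - g]]

def contextual_values(g, a=(1, 0, 0)):
    """Solve F alpha = a by Cramer's rule."""
    F = F_matrix(g)
    d = det3(F)
    alpha = []
    for k in range(3):
        Fk = [row[:] for row in F]
        for i in range(3):
            Fk[i][k] = a[i]          # replace column k by the right-hand side
        alpha.append(det3(Fk) / d)
    return alpha

g = Fr(1, 10)
alpha = contextual_values(g)
expected = [(1 - 6 * g) / (6 * g**2),
            (1 - 6 * g) / (6 * g**2),
            (-5 - 6 * g) / (6 * g**2)]      # equation (205)
```

In this sketch the determinant of $F$ comes out as $g^3$, which is why the entries of the inverse can be as singular as $1/g^2$.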

Equations (110) through (113) write the "general conditioned average" ${}_f\langle A \rangle$
of (6) as a sum of two terms, one of which evaluates to (7) in the weak limit
$g \to 0$. The other term, called the "anomalous term", is given by (113) as:

$$\text{difference between weak limit of (6) and (7)} = \lim_{g \to 0} \frac{\sum_k \frac{1}{2}\alpha_k(g)\, \mathrm{Tr}[P_f [M_k(g), [\rho, M_k(g)]]]}{\mathrm{Tr}[P_f \rho]} . \tag{206}$$

To disprove the claim of DJ, we need to show that there exists a state $\rho$ and
vector $f$ such that the anomalous term does not vanish.

It is well-known that the only matrix $S$ such that for all projection matrices
$P_f$, $\mathrm{Tr}[P_f S] = 0$, is the zero matrix $S = 0$.⁵ Hence it will be enough to show
that

$$\lim_{g \to 0} \sum_k -\frac{1}{2}\alpha_k(g) [M_k(g), [M_k(g), \rho]] \neq 0 \tag{207}$$

for some mixed state $\rho$ such that for all nonzero vectors $f$, $\mathrm{Tr}[P_f \rho] \neq 0$.

First note that for any diagonal matrix $D = \mathrm{diag}(d_1, d_2, d_3)$ and any matrix
$\rho = (\rho_{ij})$,

$$[D, [D, \rho]]_{ij} = (d_i - d_j)^2 \rho_{ij} . \tag{208}$$

In the cases of interest to us, $D$ will be one of the measurement operators
$M_k(g)$, $(d_i(g) - d_j(g))^2 = O(g^2)$ for all $i, j$, and for some $i, j$ the leading order
of $(d_i(g) - d_j(g))^2$ is actually $g^2$. The $\alpha_k(g)$ all diverge like $1/g^2$ as $g \to 0$. Thus
we can see without calculation that we will obtain a counterexample unless some
unrecognized relation forces the terms of (207) to exactly cancel.⁶

That cancellation does not occur in this case can be seen with minimal
calculation as follows. In (208), take $(i, j) := (1, 2)$, and note that $(d_1 - d_2)^2$ is
always non-negative. When $D = M_3(g)$, from the power series

$$\sqrt{c + x} = \sqrt{c} + \frac{x}{2\sqrt{c}} + O(x^2) ,$$

one sees that $(d_1 - d_2)^2 = O(g^4)$, and since $\alpha_3(g)$ is only $O(g^{-2})$, the $k = 3$
term in (207) vanishes in the limit $g \to 0$.

We also have

$$\alpha_1(g) = \alpha_2(g) = \frac{1}{6g^2} - \frac{1}{g} ,$$



5 A computational proof is routine, but to see this without calculation, recall that
$(S, T) := \mathrm{Tr}[S^\dagger T]$ defines a complex Hilbert space structure (i.e., positive definite complex inner product)
on the set of $n \times n$ matrices. If $(S, T)$ vanishes for all projectors $T = P_f$, then (by the spectral
theorem), it vanishes for all Hermitian $T$, and hence for all $T$, in which case $S$ is orthogonal
to all elements of this Hilbert space and hence is the zero element.

6 One useful observation that we can make from what we have done so far without detailed
calculation is that the attempted proof of DJ is likely wrong or at least seriously incomplete,
since that attempted proof concludes the vanishing of (207) on the basis of order of magnitude
arguments only. Though framed in different language, it essentially says that (207) must vanish
because they think that $\alpha_k(g) = (F^{(+)}(g)(1, 0, 0)^T)_k = O(1/g)$ (contradicting (205)), while
$[M_j(g), [M_j(g), \rho]] = O(g^2)$.
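and for $D = M_1(g)$,

$$(d_1 - d_2)^2 = \big(g/\sqrt{2} + O(g^2)\big)^2 = g^2/2 + O(g^3) ,$$

while for $D = M_2(g)$ the same expansion gives $(d_1 - d_2)^2 = 3g^2/4 + O(g^3)$.
So, in the limit $g \to 0$, the $(1, 2)$ entry of (207) evaluates to

$$-\frac{1}{2} \cdot \frac{1}{6}\left(\frac{1}{2} + \frac{3}{4}\right)\rho_{12} = -\frac{5}{48}\,\rho_{12} . \tag{209}$$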

Note that all we care about is that (209) does not always vanish, and this can be
seen solely from the fact that $\alpha_1$ and $\alpha_2$ have the same sign, so that the $k = 1, 2$
terms in (207) are negative multiples of $\rho_{12}$ which do not vanish identically in
the limit $g \to 0$.

To finish the proof, let $\rho$ be a positive definite state (i.e., all eigenvalues
strictly positive) such that $\rho_{12} \neq 0$. Such a state can be constructed by starting
with a positive definite diagonal state and adding a small perturbation to assure
$\rho_{12} \neq 0$ (which will result in a positive definite state if the perturbation is small
enough). Since $\rho$ is positive definite, $\mathrm{Tr}[\rho P_f] \neq 0$ for all nonzero vectors $f$, and
we are done.
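For readers who prefer a numerical check to the algebra, the sketch below evaluates (106) directly for the operators (202) and contextual values (205) at a small $g$ and compares the result with the weak value (108). The state `rho`, the unnormalized postselection vector `f`, the value g = 0.0001, and all function names are assumptions made for this illustration. Floating-point arithmetic is used, so agreement is only approximate.

```python
import math

def operators(g):
    """Diagonals of M_1(g), M_2(g), M_3(g) as in (202)."""
    r = math.sqrt
    return [(r(1/2 + g), r(1/2), r(1/2 + g)),
            (r(1/3 + g*g), r(1/3 + g), r(1/3)),
            (r(1/6 - g - g*g), r(1/6 - g), r(1/6 - g))]

def contextual_values(g):
    """Equation (205)."""
    a12 = (1 - 6*g) / (6*g*g)
    return (a12, a12, (-5 - 6*g) / (6*g*g))

def quad(f, X):
    """f^T X f for real f; equals Tr[P_f X] up to the squared norm of f."""
    n = len(f)
    return sum(f[i] * X[i][j] * f[j] for i in range(n) for j in range(n))

def conditioned_average(g, rho, f):
    """Evaluate (106) directly."""
    num = den = 0.0
    for alpha, m in zip(contextual_values(g), operators(g)):
        MrhoM = [[m[i] * rho[i][j] * m[j] for j in range(3)] for i in range(3)]
        w = quad(f, MrhoM)
        num += alpha * w
        den += w
    return num / den

# Hypothetical positive definite state with rho_12 != 0, and postselection f.
rho = [[0.4, 0.1, 0.0], [0.1, 0.3, 0.0], [0.0, 0.0, 0.3]]
f = (1.0, 1.0, 0.0)
a = (1.0, 0.0, 0.0)                                   # eigenvalues of A, (201)

anti = [[(a[i] + a[j]) * rho[i][j] for j in range(3)] for i in range(3)]  # {A, rho}
weak_value = 0.5 * quad(f, anti) / quad(f, rho)       # (108)
gap = conditioned_average(1e-4, rho, f) - weak_value  # approximates (206)
```

With these choices the weak value (108) is about 0.556, while (106) at small $g$ is about 0.532, so the gap does not tend to zero.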



6 Discussion of DJ 

6.1 Possible error in DJ's proof 

The counterexample given above unfortunately relies on some detailed
calculation. A conceptual counterexample would certainly be preferable. A reader
interested in discovering the truth of the matter will be faced with the unpleas- 
ant choice of wading through DJ's dense proof or checking the boring details of 
the counterexample. For such readers, it may be helpful if we point out what 
seems a potentially erroneous step in DJ's proof. 

A step which caused me to question their proof occurs at the very end of
their Section V:

"... to have a pole of order higher than $g^n$ [$n = 1$ in the counterexample]
then there must be at least one relevant singular value with
an order greater than $g^n$. [The counterexample has a singular value
of order $g^2$.] However, if that were the case then the expansion of $F$
to order $g^n$ would have a relevant singular value of zero and therefore
could not satisfy (25) ..."

I have not been able to guess a meaning for "the expansion of $F$ to order $g^n$
would have a relevant singular value of zero" under which the last sentence
would be true.

6.2 Significance of the Moore-Penrose pseudo-inverse 

The original paper DAJ pQ introduced the Moore-Penrose pseudo-inverse as 
follows: 
"... we propose that the physically sensible choice of [contextual
values $\alpha$] is the least redundant set uniquely related to the eigenvalues
[$a = (a_1, \ldots, a_m)$ with $A = \mathrm{diag}(a_1, \ldots, a_m)$] through the
Moore-Penrose pseudoinverse."

I puzzled for a long time over this statement. Besides the fact that the meaning
of "least redundant set" was obscure to me, they give no reason why this choice
(which presumably means $\alpha = F^{(+)} a$, with $F^{(+)}$ the Moore-Penrose
pseudo-inverse) should be considered the unique "physically sensible" choice, or even
a physically sensible choice. The arXiv paper DJ which we are discussing
attempts to fill this gap, but the attempt relies on erroneous mathematics and is
unconvincing.

Before starting the discussion of this attempt, let me remark that although 
the attempt seems partly aimed at invalidating the counterexample of [3], it 
is basically irrelevant to that aim. That counterexample is a valid mathemat- 
ical counterexample to a mathematical claim of DAJ as I imagine the vague 
exposition of DAJ would probably be interpreted by most readers. Though 
the counterexample uses a particular solution of the contextual value equation 
$F\alpha = a$, it was never claimed that this solution has any physically desirable
properties. Though DJ does not show that the counterexample is unphysical as 
DJ claims, even if it were shown unphysical, it would still disprove the claim that 
(6) necessarily evaluates to (7) in the "minimal disturbance limit". A reader 
of DAJ cannot reasonably be expected to guess that the definition of "minimal 
disturbance limit" is supposed to include the pseudo-inverse prescription. 

Therefore, the discussion will be directed toward analyzing the claim of DJ 
that the pseudo-inverse solution should be preferred because DJ thinks (incor- 
rectly) that 

". . . the pseudo-inverse solution will choose the solution that gener- 
ally provides the most rapid statistical convergence for observable 
measurements on the system." 

A careful analysis of DJ's argument for this claim will reveal flaws which inval- 
idate it. 

DJ writes:

"With the pseudo-inverse in hand, we then find a uniquely specified
solution $\alpha_0 = F^{(+)} a$ that is directly related to the eigenvalues of the
operator. Other solutions $\alpha = \alpha_0 + x$ of (3) will contain additional
components in the null space of $F$, and will thus deviate from this
least redundant solution. [True if sympathetically interpreted, but
tautological.] Consequently, the solution $\alpha_0$ has the least norm of
all solutions ..."

The Euclidean norm $\|\alpha\|^2 := \sum_j \alpha_j^2$ in the real Hilbert space $R^n$ has no physical
significance in quantum theory. Why is it relevant that $\alpha_0$ has least norm?
The discussion immediately following may possibly be intended to answer this,
but when analyzed it only tautologically repeats what has already been said.
However, an inattentive reader could easily get the impression that something
had been proved.

DJ thinks that this immediately following discussion (at the bottom of the 
first column of p. 4) gives "mathematical reasons for using the pseudoinverse" , 
but in fact no convincing reason has been given. 

The next paragraph continues: 

"In addition to the mathematical reasons for using the pseudoin- 
verse in this context [referring to the discussion just analyzed, which 
doesn't give any convincing mathematical reason] , there is an impor- 
tant physical one that we will now describe. As mentioned, a fully 
compatible detector can be used together with the contextual values 
to reconstruct any moment of a compatible observable. However, 
since the detector outcomes are imperfectly correlated with the ob- 
servable, the contextual values typically lie outside of the eigenvalue 
range and many repetitions of the measurement must be practically 
performed to obtain adequate precision for the moments. Importantly,
the uncertainty in the moments is controlled by the variance,
not of the observable operator, but of the contextual values
themselves, [emphasis mine]"

Consider a probability space with outcomes {1, 2, . . . , n} with probability $p_j$ for
outcome j. A random variable v is an assignment $j \mapsto v_j$ of a real number $v_j$
to each outcome j. The mean $\bar{v}$ of v is defined as usual by

$$\bar{v} := \sum_j v_j p_j ,$$

and the variance $\tau^2$ of v is defined by

$$\tau^2 := \sum_j (v_j - \bar{v})^2 p_j = \sum_j v_j^2 p_j - \bar{v}^2 .$$

Here we use the symbol $\tau^2$ instead of the customary $\sigma^2$ to denote the variance
to avoid confusion with the different $\sigma^2$ defined in DJ (as the second moment).

One can speak of the "variance" of a random variable on a classical probability
space, or of the "variance" of a quantum observable measured in a given
state. But what can it mean to speak of the "variance" of contextual values $\alpha_j$?
Contextual values are predefined to satisfy

$$A = \sum_j \alpha_j M_j^\dagger M_j , \tag{102}$$

where A is the "system observable" and $\{M_j\}$ a collection of measurement
operators. What is measured are the outcomes j.

However, even though we know the contextual values beforehand from (102),
one might speak of "measuring" them in the following sense. To every outcome
j corresponds a contextual value $\alpha_j$. A given state $\rho$ of the system makes the set
of all outcomes j into a probability space by assigning a probability $p_j$ to each
outcome j: $p_j = \mathrm{Tr}[M_j^\dagger M_j \rho]$. The assignment $j \mapsto \alpha_j$ is a random variable
on this probability space, and it is meaningful to speak of its "variance". The
subsequent analysis assumes that this is the meaning that DJ intended. This
discussion may seem inappropriately elementary, but I was initially puzzled
about this point, and it cannot hurt to make it explicit.

Note that the mean $\bar{\alpha} = \mathrm{Tr}[A\rho]$ of this random variable is the same no
matter how the contextual values $\alpha_j$ are chosen so long as they satisfy the
contextual value equation (102). That implies that choosing the contextual
values so as to minimize the true variance $\tau^2$ in a given state is equivalent
to minimizing the second moment $\sigma^2$. Note also that the mean and variance
implicitly depend on the state $\rho$, and that there is no reason to think that one
might be able to choose the contextual values so as to minimize the variance in
all states.
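
The independence of the mean from the choice of contextual values can be checked numerically. The following is a minimal sketch under assumptions of my own, not an example taken from DJ: a commuting two-level case with A = diag(1, -1), three outcomes, and a made-up 2 x 3 matrix F whose entry F[i][j] is the probability of outcome j in the i-th eigenstate of A. All numbers and function names are hypothetical.

```python
# Hypothetical data (not from DJ): A = diag(a[0], a[1]); F[i][j] is the
# probability of outcome j when the system is in the i-th eigenstate of A,
# i.e. the i-th diagonal entry of M_j^dagger M_j.
F = [[0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5]]
a = [1.0, -1.0]
rho_diag = [0.9, 0.1]  # diagonal of an assumed state rho


def apply_F(alpha):
    """Return F(alpha); the contextual value equation demands F(alpha) = a."""
    return [sum(F[i][j] * alpha[j] for j in range(3)) for i in range(2)]


def outcome_probs(rho_diag):
    """p_j = Tr[M_j^dagger M_j rho] for a diagonal state rho."""
    return [sum(rho_diag[i] * F[i][j] for i in range(2)) for j in range(3)]


def mean(alpha, p):
    return sum(x * q for x, q in zip(alpha, p))


def second_moment(alpha, p):
    return sum(x * x * q for x, q in zip(alpha, p))


def variance(alpha, p):
    return second_moment(alpha, p) - mean(alpha, p) ** 2


alpha_a = [10 / 3, 0.0, -10 / 3]   # one solution of F(alpha) = a
eta = [3.0, -7.0, 3.0]             # spans the nullspace of F
alpha_b = [x + 0.1 * e for x, e in zip(alpha_a, eta)]  # another solution

p = outcome_probs(rho_diag)
for alpha in (alpha_a, alpha_b):
    print(apply_F(alpha), mean(alpha, p),
          second_moment(alpha, p), variance(alpha, p))
```

In this assumed example both solutions have the same mean, Tr[A rho] = 0.9 - 0.1 = 0.8, while their second moments differ. Because the mean is common to both, the variances differ by exactly the same amount as the second moments.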

DJ continues: 

"Consequently, it is in the experimenters best interests to minimize 
the second moment of the contextual values, 

$$\sigma^2 = \sum_j \alpha_j^2 p_j , \tag{210}$$

where $p_j$ is the probability of outcome j"

DJ correctly identifies $\sigma^2$ as the second moment, but unless read very carefully,
the subsequent discussion could encourage confusion of $\sigma^2$ with the true variance
$\tau^2$.

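Next DJ notes that $\|\vec{\alpha}\|^2$ is a (very crude) upper bound for $\sigma^2$:

$$\sigma^2 := \sum_j \alpha_j^2 p_j \le \sum_j \alpha_j^2 = \|\vec{\alpha}\|^2 . \tag{$*$}$$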

"In the absence of prior knowledge about the system one is deal- 
ing with, this is the most general bound one can make. Therefore, 
the pseudo-inverse solution will choose the solution that generally 
provides the most rapid statistical convergence for observable mea- 
surements on the system." 

This is highly questionable. Although it may not be clear at this point, subsequent
paragraphs make clear that DJ is claiming that it is legitimate to use
$\|\vec{\alpha}\|^2$ as a sort of estimate for $\sigma^2$, the strange and invalid justification for the
claim being the sentences of the quote following equation (*).

DJ's next paragraph computes $\|\vec{\alpha}(g)\|^2$ for both the $\vec{\alpha}(g)$ used in the counterexample
of [3] and for the pseudo-inverse solution $\vec{\alpha}_0(g) = F^{+}(g)\vec{a}$, using
$\|\vec{\alpha}(g)\|^2$ as a kind of crude estimate for $\sigma^2 = \sigma^2(g) = \sigma^2(g,\rho)$.

"For the case of the counterexample, the Parrott solution (13) [(13) 
should be (11)] has to leading order the bound on the variance 

[Equation (15) of DJ, giving $\|\vec{\alpha}\|^2$ to leading order, is not legible here.]

while the pseudoinverse solution (11) [(11) should be (13)] has to
leading order the bound

[Equation (16) of DJ, giving $\|\vec{\alpha}_0\|^2$ to leading order, is not legible here.]

For any observable a, the Parrott solution has detector variance of
order $O(1/g^4)$ [emphasis mine], which would swamp any attempt to
measure an observable near the weak limit. . . . However, the pseudoinverse
solution has a detector variance of order $O(1/g^2)$ in the
worst case; ..."

What invalidates the argument is the use of the crude upper bound (*) as an
estimate for the second moment $\sigma^2$ and the subsequent claim that "the Parrott
solution has detector variance of order $O(1/g^4)$ ..."$^7$

Solely from upper bounds for two quantities, one cannot draw any reliable 
conclusions about the relative size of the quantities themselves. To see this 
clearly in a simpler context which uses essentially the same reasoning, consider 
the upper bounds

$$x < x^4 \quad\text{and}\quad x^2 < x^3$$

for real numbers x > 1. From the fact that the first upper bound $x^4$ (for x)
is larger than the second upper bound $x^3$ (for $x^2$), we cannot conclude that x
is larger than $x^2$ for x > 1. Yet DJ's argument relies on this type of incorrect
reasoning.

In the interests of following closely the exposition of DJ, we passed rapidly 
over (*). Let us return to analyze it more closely: 

$$\sigma^2 := \sum_j \alpha_j^2 p_j \le \sum_j \alpha_j^2 = \|\vec{\alpha}\|^2 . \tag{$*$}$$

"In the absence of prior knowledge about the system one is deal- 
ing with, this is the most general bound one can make. Therefore, 
the pseudo-inverse solution will choose the solution that generally 
provides the most rapid statistical convergence for observable mea- 
surements on the system." 

Note once again that $\sigma^2 = \sigma^2(\rho)$ depends implicitly on the state $\rho$ because the
probabilities $p_j = \mathrm{Tr}[\rho M_j^\dagger M_j]$ of outcome j depend on $\rho$. Keeping this in mind,
one sees how crude the upper bound (*) really is.

For a nonzero system observable A, equality holds in (*) (i.e., $\sigma^2(\rho) = \|\vec{\alpha}\|^2$)
only in the trivial case in which one particular $p_J = 1$ and the others vanish,
and in addition, $\alpha_j(g) = 0$ for $j \ne J$. That corresponds to the trivial case
in which there is effectively only one measurement operator $M_J(g)$ satisfying
$\alpha_J(g) M_J^\dagger(g) M_J(g) = A$. (The other measurement operators play the role of
assuring that $\sum_j M_j^\dagger M_j = I$, but do not contribute to the estimation of the
expectation of A in the state $\rho$, $\mathrm{Tr}[A\rho]$.) Since one of DJ's hypotheses (which
we did not discuss above) is that $\lim_{g \to 0} M_j(g)$ is a multiple of the identity for
all j, also the system observable A is a multiple of the identity.

7 DJ incorrectly identifies $\sigma^2$ as the "variance" instead of the second moment, but this is a
mere slip. Ignoring this slip, technically one could argue that this statement is correct because
to say that a quantity is $O(1/g^4)$ only means that it increases no faster than $1/g^4$ as $g \to 0$.
For example, $g^5 = O(1/g^4)$. However, in the context and taking into account the typically
sloppy use of the "big-oh" notation in the physics literature, most readers would probably
interpret this passage as claiming that the "Parrott solution" has variance of leading order
$1/g^4$, which would be an invalid conclusion from the argument.

The statement following (*), that "this is the most general bound one can 
make", seems a very strange form of reasoning. Doubtless, (*) was the most 
general bound that the authors knew how to make, but it seems unscientific to 
base an important argument on an unsupported personal belief that no one else 
can do better. 

In fact, a better bound is possible. By the Cauchy-Schwarz inequality,

$$\sigma^2 = \sum_j \alpha_j^2 p_j \le \Big[\sum_j (\alpha_j^2)^2\Big]^{1/2} \Big[\sum_j p_j^2\Big]^{1/2} \le \Big[\sum_j \alpha_j^4\Big]^{1/2} , \tag{$**$}$$

since $0 \le p_j \le 1$, so $p_j^2 \le p_j$ and $\sum_j p_j^2 \le \sum_j p_j = 1$. That (**) is a better
bound than (*) when at least two $\alpha_j$ are nonzero follows from

$$\Big[\sum_j (\alpha_j^2)^2\Big]^{1/2} < \sum_j \alpha_j^2$$

because for any collection of at least two positive numbers $\{q_j\}$, $\sum_j q_j^2 < [\sum_j q_j]^2$.

If the authors were to reformulate their proposal for the appropriate choice
of the contextual values in terms of this better bound, it seems unlikely that
DAJ's proposed Moore-Penrose pseudo-inverse solution would minimize (**), or
possible bounds even better than (**). And as pointed out earlier, the physical
meaning or appropriateness of minimizing a particular upper bound for the
detector's second moment remains obscure.
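
The ordering of the second moment and its two upper bounds, (**) and (*), can be illustrated with a few lines of code. This is only a sketch with made-up contextual values and outcome probabilities; the numbers and function names are assumptions for illustration and do not come from DJ.

```python
import math


def second_moment(alpha, p):
    """sigma^2 = sum_j alpha_j^2 p_j."""
    return sum(x * x * q for x, q in zip(alpha, p))


def bound_star(alpha):
    """The bound (*): ||alpha||^2 = sum_j alpha_j^2."""
    return sum(x * x for x in alpha)


def bound_double_star(alpha):
    """The bound (**): (sum_j alpha_j^4)^(1/2)."""
    return math.sqrt(sum(x ** 4 for x in alpha))


# Hypothetical contextual values and outcome probabilities (assumed numbers).
alpha = [10 / 3, 0.0, -10 / 3]
p = [0.47, 0.30, 0.23]

sigma2 = second_moment(alpha, p)
print(sigma2, bound_double_star(alpha), bound_star(alpha))
```

For these assumed numbers the three printed quantities appear in increasing order, and the second moment lies well below both bounds, which shows how little either bound says about the second moment itself.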



6.3 What is the "physically sensible" choice of contextual 
values? 

In many experiments (indeed, in all experiments known to me), the system
always starts in a known state $\rho$. For such an experiment, it seems to me that
the "physically sensible choice" of contextual values would be the choice that
minimizes the detector variance $\tau^2 = \tau^2(\rho)$ in that initial state $\rho$.

It is a simple exercise to work out a necessary condition for this minimization, 
and the pseudo-inverse prescription does not necessarily satisfy it. For the 
reader's convenience, we sketch the details. 

The contextual values equation (102) for contextual values $\vec{\alpha} = (\alpha_1, \ldots, \alpha_N)$
can always be written as a linear system given by a vector equation

$$\vec{a} = F(\vec{\alpha}) , \tag{211}$$

where $\vec{a}$ is a vector associated with the system observable A and F a matrix
whose size will depend on the dimension of $\vec{a}$.$^8$ Given an initial state $\rho$, measurement
operators $M_j$, and associated probabilities $p_j = \mathrm{Tr}[\rho M_j^\dagger M_j]$, we want
to minimize the detector variance

$$\tau^2(\rho) := \sum_i p_i \alpha_i^2 - \Big(\sum_i p_i \alpha_i\Big)^2 . \tag{212}$$

As noted in the preceding subsection, for a particular state $\rho$ and taking into
account the contextual value equation (102), this is the same as minimizing the
second moment

$$\sigma^2(\rho) := \sum_i p_i \alpha_i^2 . \tag{213}$$

(To avoid confusion, we continue using DJ's nonstandard notation $\sigma^2$ for the
second moment instead of the variance.)

Let $\vec{\alpha}^p$ denote a particular solution of $F(\vec{\alpha}) = \vec{a}$. Then the general solution
of $F(\vec{\alpha}) = \vec{a}$ is $\vec{\alpha} = \vec{\alpha}^p + \vec{\eta}$ with $\vec{\eta}$ in the nullspace Null(F) of F, and

$$\sigma^2 := \sum_i p_i \alpha_i^2 = \sum_i p_i (\alpha_i^p)^2 + 2 \sum_i p_i \alpha_i^p \eta_i + \sum_i p_i \eta_i^2 . \tag{214}$$

For small $\vec{\eta}$, a nonvanishing linear second term will dominate the quadratic third
term,$^9$ and we see that if $\vec{\alpha}^p$ is to minimize $\sigma^2$, then the vector $(p_1 \alpha_1^p, \ldots, p_N \alpha_N^p)$
must be orthogonal to the nullspace of F. This is the necessary condition
mentioned earlier.

Thus it seems to me that a "physically sensible" choice of contextual values
in this situation should satisfy this necessary condition. However, the
pseudo-inverse solution is abstractly defined by the different condition that
$\vec{\alpha}^p = (\alpha_1^p, \ldots, \alpha_N^p)$ be orthogonal to Null(F).$^{10}$

Even if the state $\rho$ is not known from the start, to estimate the expectation
of A as $\sum_j \alpha_j \mathrm{Tr}[M_j^\dagger M_j \rho] = \sum_j \alpha_j p_j$, one needs to estimate the $p_j$ as frequencies
of occurrence of outcome j, so the $p_j$ can be regarded as experimentally
determined to any desired accuracy. Given these $p_j$, one can then choose the
solution $\vec{\alpha}$ to the contextual value equation $F(\vec{\alpha}) = \vec{a}$ to minimize (214) and
the detector variance. This procedure for minimizing the detector variance will
rarely result in the pseudo-inverse solution.
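
The minimization just described can be sketched numerically. The example below is hypothetical and is not taken from DJ: it assumes a commuting two-level case with A = diag(1, -1), three outcomes, a made-up matrix F, and the state rho = diag(0.9, 0.1). Because the nullspace of this F is one-dimensional, minimizing the second moment over all solutions reduces to minimizing a quadratic in a single real parameter t. All names and numbers are illustrative assumptions.

```python
# Hypothetical commuting example (assumed numbers, not from DJ):
# F[i][j] = probability of outcome j in the i-th eigenstate of A = diag(a).
F = [[0.5, 0.3, 0.2],
     [0.2, 0.3, 0.5]]
a = [1.0, -1.0]
p = [0.47, 0.30, 0.23]   # outcome probabilities for rho = diag(0.9, 0.1)

# Solution of F(alpha) = a that is orthogonal to Null(F): for this F it is
# the Moore-Penrose pseudo-inverse solution.
alpha_pinv = [10 / 3, 0.0, -10 / 3]
eta = [3.0, -7.0, 3.0]   # spans the one-dimensional Null(F)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def apply_F(alpha):
    return [dot(row, alpha) for row in F]


def second_moment(alpha):
    return sum(q * x * x for q, x in zip(p, alpha))


# sigma^2(alpha_pinv + t*eta) is a quadratic in t; setting its derivative to
# zero gives t = -(sum_i p_i alpha_i eta_i) / (sum_i p_i eta_i^2).
linear = sum(q * x * e for q, x, e in zip(p, alpha_pinv, eta))
quadratic = sum(q * e * e for q, e in zip(p, eta))
t_min = -linear / quadratic
alpha_min = [x + t_min * e for x, e in zip(alpha_pinv, eta)]

# The vector (p_1 alpha_1, ..., p_N alpha_N) for the minimizing solution.
weighted = [q * x for q, x in zip(p, alpha_min)]
print(t_min, second_moment(alpha_pinv), second_moment(alpha_min),
      dot(weighted, eta))
```

In this assumed example t_min is nonzero, so the solution with the smallest second moment (and hence the smallest variance) in the state rho is not the pseudo-inverse solution. The weighted vector of the minimizer is orthogonal to eta, as the necessary condition requires, whereas the pseudo-inverse solution itself, rather than its weighted vector, is what is orthogonal to eta.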

8 If A or some of the measurement operators are not diagonal, then $\vec{a}$ will not be the vector
of eigenvalues of A as in DJ. For example, if A is a general $2 \times 2$ Hermitian matrix, then
$\vec{a} = (a_1, a_2, a_3)$ may be taken to be the three-dimensional vector $(A_{11}, A_{12}, A_{22})$, and in
general, $\vec{a}$ may be formed from the components of A on or above its main diagonal.

9 More precisely, if for some $\vec{\eta}$ the linear term does not vanish, then replacing $\vec{\eta}$ by $x\vec{\eta}$, with
x real, gives a quadratic function in x with nonvanishing linear term, which cannot have a
minimum at x = 0.

10 This is discussed but not proved in the Appendix to [4]. A formal statement and proof
can be found in p. 9, Theorem 1.1.1.

6.4 Does DAJ assume that contextual values $\vec{\alpha}$ come from
the Moore-Penrose pseudo-inverse, $\vec{\alpha} = F^{+}\vec{a}$?

We have seen that none of the reasons that DJ gives for determining contextual
values by the pseudo-inverse construction,

$$\vec{\alpha} = F^{+}\vec{a} , \tag{215}$$

hold up under scrutiny. DAJ doesn't give any valid reasons, either. Its "general 
conditioned average" (6) does not require this hypothesis, nor the hypothesis 
that the system observable A and measurement operators Mj mutually com- 
mute. Why assume something that is not needed? 

DJ gives the false impression that DAJ unequivocally assumes (215) as a
hypothesis. For example,

"The problem with Parrott's counterexample is that he ignores this 
discussion [of defining the contextual values by the pseudo-inverse 
prescription $\vec{\alpha} := F^{+}\vec{a}$]."

The totality of this "discussion" is the single sentence: 

". . . we propose that the physically sensible choice of CV is the 
least redundant set uniquely related to the eigenvalues through the 
Moore-Penrose pseudoinverse." 

DAJ does devote a long paragraph to a complicated method of defining and 
calculating the Moore-Penrose pseudo-inverse, but that has nothing to do with 
the reasons for using the pseudo-inverse in the first place. A reference to a 
mathematical text would have sufficed and saved sufficient space to have clearly 
stated their hypotheses for (6) and for the claimed implication that (6) implies 
(7) in their "minimal disturbance limit". If the authors don't tell us, how can
we poor readers possibly guess that the pseudo-inverse prescription (215) is
assumed as a hypothesis for (6) (if in fact it is, which to this day I don't know), 
or if not, as a hypothesis for a section which follows (6), such as the "Weak 
values" section? 

When I wrote [3] giving the counterexample, I did consider the possibility 
that DAJ might possibly be assuming the pseudo-inverse solution, but rejected 
it as implausible. This was partly because they had previously sent me an 
attempted proof that their (6) implies (7) in their "minimal disturbance limit" 
which if correct (it wasn't) would have applied to any solution $\vec{\alpha}$, not just the
pseudo-inverse solution. (It also would have applied even if the measurement 
operators and system observable did not mutually commute.) So, I knew to a 
certainty that when DAJ was submitted, there was no reason for the authors to 
have assumed the pseudo-inverse prescription. 

Also, in the sweeping claim of DAJ's abstract that their "general conditioned 
average" (6) 

"... converges uniquely to the quantum weak value in the minimal 
disturbance limit",
by no stretch of the imagination could the reader guess that the technical
pseudo-inverse prescription would be part of the definition of "minimal disturbance
limit". And if the prescription is not part of the definition of "minimal
disturbance limit", then to justify the claim, the prescription would have to
be taken as part of the definition of their "general conditioned average" (6).
But the latter alternative would artificially limit the applicability of (6), since
(6) is correct no matter how the contextual values are chosen (subject to the
contextual value equation (102)).

6.5 Section VI of DJ: 

The last four paragraphs of Section VI of DJ (entitled "Discussion") are misleading
and in some ways incorrect. The reasons are given in Section 11.1 of [4]
and will not be repeated here.



7 Acknowledgments 

I was surprised to see in DJ the acknowledgment: "We acknowledge correspon- 
dence with S. Parrott" . That made me wonder if protocol required that I provide 
a similar acknowledgment. And if so, what should it say? Would it be proper 
to acknowledge negative contributions as well as positive ones, and if so should 
I? If I didn't, how would I explain why I didn't simply ask the authors about 
some of the questionable points in DAJ? 

The (nearly unique) positive contribution of the authors of DAJ to [4], [3],
and the present work was to furnish their original argument that (6) implies (7)
in their "minimal disturbance limit". That argument brought to my attention
the decomposition of equation (111), which was part of their attempted proofs.

That argument was definitely incorrect because I found a counterexample to 
one of its steps. I sent the counterexample to the authors in mid-February, but 
they never acknowledged it. I made several subsequent inquiries about other 
points in DAJ, but all were ignored. I have not heard from them since February 
19. (It is now June 23). (What little correspondence we did exchange was 
uniformly courteous.) That is why I was unable to clarify other vague points in 
DAJ such as for which results (if any) the pseudo-inverse solution was assumed 
as a hypothesis. 

I intend to eventually post on my website, www.math.umb.edu/~sp, a
complete account of the strange aspects of this affair, which has been unique in 
my professional experience. It will raise questions about the editorial practices 
of influential journals of the American Physical Society, among other issues. 

DJ acknowledges that their work was supported by two grants, at least one
of which was taxpayer-supported via the National Science Foundation. The
present work was not supported by any grants, unless donation of the author's
time might be considered a kind of "grant".
If so, it is a "grant" to society in general. I have spent months trying to
unravel DAJ, mostly without any help. I submit this to the arXiv to save others
similar time. It is painful to realize that I have largely wasted my time for a
contribution so small, but it is satisfying to hope that the time saved by others
may result in larger contributions than I could have made.

Added in version 8: Version 2 of [2], arXiv:1106.1871v2, replies to the present
work. It was published in J. Phys. A: Math. Theor. 45 015304. The
published version will be called DJpub below.

I thank the authors for noting a typo in the definition of the (3,3) entry of
the $3 \times 3$ matrix $M_3(g)$ in the Section 5 counterexample on p. 11. The original
entry 1/3 should have been $\sqrt{1/3}$, and this correction has been made in this
Version 7. The original analysis assumed the correct value, so apart from this
substitution, no changes were necessary.

DJpub reinterprets (unjustifiably, in my view) one of the hypotheses of [5] and 
notes that the counterexample given above does not satisfy the reinterpreted 
hypothesis. An analysis of DJpub has been posted in arXiv:1202.5604, and an 
abbreviated version has been under consideration by J. Phys. A for over 10
months (as of this writing, October 14, 2012).